Objects in R

2023 Bio R Workshop

Author

Prof. Rey R. Cuenca
Math-Stat Dept., MSU-IIT

An Intuitive Framework

One way to gain a partial yet quick understanding of a complex system of ideas is to form a simplified mental picture of it. The same approach helps when learning R, whose learning curve is quite steep:

“The learning curve for R programming is steep due to its unique syntax and extensive set of commands, requiring most new learners to spend four to six weeks mastering it.” - Noble Desktop (NYC’s Top Design & Coding School Since 1990)

An (over)simplified mental picture for beginners is to think of working in R as cooking. Cooking essentially requires three things:

  1. Ingredients – R objects, a.k.a. “data containers”
  2. Cooking utensils/equipment – R functions
  3. Recipe – R scripts or Markdown files

Figure 1: Mental picture when working with R

You can think of RStudio’s Console and Source Panes as the cooking table of the “chef” (you).

Vectors

Probably the most fundamental object that acts as a “data container” (i.e. data structure) in R is the vector (also called an atomic vector). Almost all other objects that a typical R user works with are built from vectors. Every vector has three properties:

  1. Type - typeof(), what it is
  2. Length - length(), how many elements it contains
  3. Attributes - attributes(), additional arbitrary metadata
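As a minimal sketch of these three properties, the snippet below inspects a small named vector. The variable name heights and its values are made up for illustration.

```r
# Hypothetical example data: a named numeric vector
heights <- c(Mark = 150.5, John = 148.0, Maria = 152.3)

typeof(heights)      # the storage type of the elements
length(heights)      # the number of elements
attributes(heights)  # the metadata; here, the element names
```

A vector created without names, such as c(1, 2, 3), has no attributes, so attributes() returns NULL for it.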

Vectors can be created in many ways. The two most basic ways depend on the length of the vector:

  1. Length = 1. Type a single value, such as a quoted string or a number, directly in the Console Pane.
  2. Length > 1. Use the R combine command c().

Characters or Strings

"a"
"a,    b,c"
c("a","b","c")
typeof("a")
typeof("a,    b,c")
typeof(c("a","b","c"))
length("a")
length("a,    b,c")
length(c("a","b","c"))

Numbers

15L
1.0
1 + 2i

c(1L,2L,0L,-15L)
c(1.0,1,4,6,-56,1e-10,1e4)
c(1 + 2i,1,0 - 3i, 3i)
typeof(15L)
typeof(1.0)
typeof(1 + 2i)

typeof(c(1L,2L,0L,-15L))
typeof(c(1.0,1,4,6,-56,1e-10,1e4))
typeof(c(1 + 2i,1,0 - 3i, 3i))
length(15L)
length(1.0)
length(1 + 2i)

length(c(1L,2L,0L,-15L))
length(c(1.0,1,4,6,-56,1e-10,1e4))
length(c(1 + 2i,1,0 - 3i, 3i))
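Because a vector holds a single type, combining numbers of different types with c() promotes them to the most general type involved. The short sketch below illustrates this; the variable names and values are arbitrary.

```r
ints  <- c(1L, 2L)            # integers only
mixed <- c(1L, 2.5)           # an integer combined with a double
cplx  <- c(1L, 2.5, 1 + 2i)   # combined with a complex number

typeof(ints)    # "integer"
typeof(mixed)   # "double"
typeof(cplx)    # "complex"
```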

Logical or Boolean

T
F
TRUE
FALSE
c(T,FALSE)
c(T,T,T,T,F,FALSE,F,TRUE,T,FALSE,T)
typeof(c(T,FALSE))
length(c(T,FALSE))
attributes(c(T,FALSE))
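T and F are shorthand for TRUE and FALSE, but unlike the full names they are ordinary variables whose values can be overwritten. The sketch below shows this inside local(), so that the rest of the session is not affected; the name shadowed is made up for illustration.

```r
identical(T, TRUE)    # TRUE in a fresh session: T starts out equal to TRUE

# Inside local(), T is reassigned without touching the rest of the session
shadowed <- local({
  T <- 0              # allowed: T is an ordinary variable
  T
})
shadowed              # 0, no longer a logical value

sum(c(TRUE, TRUE, FALSE))   # 2: TRUE counts as 1 and FALSE as 0
```

For this reason, spelling out TRUE and FALSE is the safer habit in scripts.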

Matrix

# Number of entries matches number of elements
matrix(c(1,2,3,4,5,6,7,8), nrow = 2, ncol = 4)
matrix(c(1,2,3,4,5,6,7,8), nrow = 2, ncol = 4, byrow = TRUE)

# Number of entries does not match number of elements
# Resolved by recycling elements
matrix(c(1,2,3,4,5,6,7,8), nrow = 2, ncol = 10)
matrix(c(1,2,3,4,5,6,7,8), nrow = 2, ncol = 13, byrow = TRUE)
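Recycling is easier to see on a small case. In the sketch below, four values fill a 2 x 4 matrix, so each value is used exactly twice; the names m_col and m_row are arbitrary.

```r
# 4 values for a 2 x 4 matrix (8 entries): the values are reused in order
m_col <- matrix(1:4, nrow = 2, ncol = 4)                # filled column by column
m_row <- matrix(1:4, nrow = 2, ncol = 4, byrow = TRUE)  # filled row by row

m_col   # columns 3 and 4 repeat columns 1 and 2
m_row   # row 2 repeats row 1
```

When the size of the matrix is not a multiple of the number of supplied values, as in the two calls above, recent versions of R may also emit a warning.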
## Example of setting row and column names
matrix(data = c(1,2,3, 11,12,13), 
       nrow = 2, 
       ncol = 3, 
       byrow = TRUE,
       dimnames = list(c("row1", "row2"),
                       c("C.1", "C.2", "C.3")))
cbind(c(1,2,3,4), c(5,6,7,8))
rbind(c(1,2,3,4), c(5,6,7,8))
cbind(c(1,2,3,4),
      c(5,6,7,8), 
      c("A","B","C","D"))

rbind(c(1,2,3,4),
      c(5,6,7,8),
      c(T,F,T,T))

rbind(c(143,243),
      cbind(c(5,6,7,8), 
            c(T,F,T,T)))
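A matrix, like a vector, holds a single type, so cbind() and rbind() convert mixed inputs to a common type. The following sketch checks the result on smaller inputs; the names chr_mat and num_mat are made up for illustration.

```r
chr_mat <- cbind(c(1, 2), c("A", "B"))      # numbers and strings
num_mat <- rbind(c(1, 2), c(TRUE, FALSE))   # numbers and logicals

typeof(chr_mat)   # "character": the numbers became strings
typeof(num_mat)   # "double": TRUE became 1 and FALSE became 0
dim(chr_mat)      # 2 rows and 2 columns
```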

Data Frame

data.frame(
  ID = c(1103,1483,5670),
  Name = c("Mark","John","Maria"),
  Age = c(15L,13L,16L),
  BType = c("A","O","B"),
  WVaccine = c(T,T,F)
)
    ID  Name Age BType WVaccine
1 1103  Mark  15     A     TRUE
2 1483  John  13     O     TRUE
3 5670 Maria  16     B    FALSE
dplyr::tibble(
  ID = c(1103,1483,5670),
  Name = c("Mark","John","Maria"),
  Age = c(15L,13L,16L),
  BType = c("A","O","B"),
  WVaccine = c(T,T,F)
)
# A tibble: 3 × 5
     ID Name    Age BType WVaccine
  <dbl> <chr> <int> <chr> <lgl>   
1  1103 Mark     15 A     TRUE    
2  1483 John     13 O     TRUE    
3  5670 Maria    16 B     FALSE   
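Unlike a matrix, a data frame keeps a separate type for each column, because it is built on a list of equal-length vectors. The sketch below, which reuses part of the example data under the hypothetical name students, inspects that structure.

```r
students <- data.frame(
  ID = c(1103, 1483, 5670),
  Name = c("Mark", "John", "Maria"),
  Age = c(15L, 13L, 16L)
)

typeof(students)           # "list": a data frame is built on a list
length(students)           # 3: the number of columns
dim(students)              # 3 rows and 3 columns
sapply(students, typeof)   # the storage type of each column
```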

Lists

A list is a vector “on steroids”. While a vector allows only a single type of data (logical, numeric, etc.), a list allows a mixture of different types. In other words, a vector is a homogeneous container, while a list is a heterogeneous one.

c(1,2,3)
list(1,2,3)
c(1,"A",TRUE,c(5.4,-4.0))
list(1,"A",TRUE,c(5.4,-4.0))
typeof(list(1,"A",TRUE,c(5.4,-4.0)))
length(list(1,"A",TRUE,c(5.4,-4.0)))
attributes(list(1,"A",TRUE,c(5.4,-4.0)))
list(Name1 = 1, Name2 = "A", Name3 = TRUE, Name4 = c(5.4,-4.0))
typeof(list(Name1 = 1, Name2 = "A", Name3 = TRUE, Name4 = c(5.4,-4.0)))
length(list(Name1 = 1, Name2 = "A", Name3 = TRUE, Name4 = c(5.4,-4.0)))
attributes(list(Name1 = 1, Name2 = "A", Name3 = TRUE, Name4 = c(5.4,-4.0)))
list(Name1 = 1,
     Name2 = "A",
     Name3 = TRUE,
     Name4 = c(5.4,-4.0))
list("Name 1" = 1,
     "Name 2" = "A",
     "Name 3" = TRUE,
     "Name 4" = c(5.4,-4.0))
list(`Name 1` = 1,
     `Name 2` = "A",
     `Name 3` = TRUE,
     `Name 4` = c(5.4,-4.0))
list(
  `A vector` = 1:10,
  `A matrix` = matrix(1:9, nrow = 3),
  `A list` = list(Name1 = 1, 
                        Name2 = "A", 
                        Name3 = TRUE, 
                        Name4 = c(5.4,-4.0))
)
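The homogeneous versus heterogeneous distinction can be checked directly. In the sketch below (variable names are arbitrary), c() converts everything to one type and flattens the nested vector, while list() keeps each element as it was given.

```r
flat <- c(1, "A", TRUE, c(5.4, -4.0))
kept <- list(1, "A", TRUE, c(5.4, -4.0))

typeof(flat)           # "character": every element was converted to a string
length(flat)           # 5: the nested vector was flattened into the result
length(kept)           # 4: the nested vector stays a single element
sapply(kept, typeof)   # "double" "character" "logical" "double"
```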

Variables and Constants

In computer programming, a variable is a named memory location where data is stored. Constants are entities whose values are not meant to change anywhere in the code.

x <- c(5,19,-2,0)
x
typeof(x)
length(x)
HONEY <- list(1,"A",TRUE,c(5.4,-4.0))
HONEY
typeof(HONEY)
length(HONEY)
student_data <- data.frame(
                  ID = c(1103,1483,5670),
                  Name = c("Mark","John","Maria"),
                  Age = c(15L,13L,16L),
                  BType = c("A","O","B"),
                  WVaccine = c(T,T,F)
                )
student_data
typeof(student_data)
length(student_data)
attributes(student_data)

The variables x, HONEY, and student_data are stored in the Global Environment, which is displayed in the Environment Pane.

You can also list all the variables currently stored in the Global Environment using the ls() command:

ls()

Certain rules must be followed when naming variables and constants:

  • A variable name in R can be created using letters, digits, periods, and underscores.

  • You can start a variable name with a letter or a period, but not with digits.

  • For multi-word variable names, it is advised to use underscores in place of spaces. For example, first_name, student_id, etc.

  • If a variable name starts with a period, the period cannot be followed immediately by a digit. For example, .2x is not a valid name.

  • R is case sensitive. This means that age and Age are treated as different variables.

  • Reserved words (such as if, else, TRUE, and NULL) cannot be used as variable names. These names are built into R, and changing them would lead to “horrifying” consequences. You are warned!
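The rules above can be tried out with a short sketch. It relies on the base R function make.names(), which turns arbitrary strings into syntactically valid names, so a string is already a valid name if make.names() leaves it unchanged. The candidate names are made up for illustration.

```r
first_name <- "Maria"   # letters and an underscore
.hidden    <- 1         # may start with a period not followed by a digit
age <- 15
Age <- 51               # a different variable from age (case sensitivity)

# Check a few candidate names against the naming rules
candidates <- c("first_name", "2nd_try", ".5x", "first name", "TRUE")
candidates == make.names(candidates)
# TRUE FALSE FALSE FALSE FALSE
```

Only first_name passes: the others start with a digit, start with a period followed by a digit, contain a space, or are a reserved word.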

Special and Built-in R Constants

Special R Constants:

  • NULL – represents an empty R object.

    x <- NULL
    x
    x <- c(5,NULL,-6)
    x
  • Inf / -Inf – represent positive and negative infinity, or numbers that exceed the capacity of the machine.

    Inf
    -Inf
  • NaN (Not a Number) – represents an undefined numerical value like 0/0 or Inf/Inf.

    NaN
    0/0
    Inf/Inf
  • NA (Not Available) – represents a value that is not available, i.e. a missing value.
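The special constants listed above behave differently inside vectors and calculations. The following sketch shows a few checks using base R functions; the vector ages is hypothetical.

```r
length(NULL)                  # 0: NULL is an empty object
length(c(5, NULL, -6))        # 2: NULL contributes no element

1 / 0                         # Inf
is.finite(c(1, Inf, -Inf))    # TRUE FALSE FALSE

is.nan(0 / 0)                 # TRUE

ages <- c(15, NA, 16)         # NA marks a missing value
is.na(ages)                   # FALSE TRUE FALSE
mean(ages)                    # NA: the result is unknown
mean(ages, na.rm = TRUE)      # 15.5: missing values are dropped first
```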

Built-in R Constants:

  • LETTERS – the 26 upper-case letters of the Roman alphabet

    LETTERS
  • letters – the 26 lower-case letters of the Roman alphabet

    letters
  • month.abb – the three-letter abbreviations for the English month names

    month.abb
  • month.name – the English names for the months of the year

    month.name
  • pi – the constant \(\pi=3.1415927\ldots\), i.e. the ratio of the circumference of a circle to its diameter

    pi
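The built-in constants are ordinary vectors, so they can be used like any other object. A brief sketch, in which radius is a made-up value:

```r
length(LETTERS)                                  # 26
identical(toupper(letters), LETTERS)             # TRUE
identical(substr(month.name, 1, 3), month.abb)   # TRUE

radius <- 2               # hypothetical radius
area   <- pi * radius^2   # area of a circle
area
```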